Term Weighting Method based on Information Gain Ratio for Summarizing Documents Retrieved by IR Systems

نویسندگان

  • Tatsunori Mori
  • Miwa Kikuchi
  • Kazufumi Yoshida
چکیده

This paper proposes a new term weighting method for summarizing documents retrieved by IR systems. Unlike query-biased summarization methods, our method utilizes not the information of query, but the similarity information among original documents by hierarchical clustering. In order to map the similarity structure of the clusters into the weight of each word, we adopt the information gain ratio (IGR) of probabilistic distribution of each word as a term weight. If the amount of information of a word in a cluster increases after the cluster is partitioned into sub-clusters, we may consider that the word contributes to determine the structure of the sub-clusters. The IGR is a measure to express the degree of such contribution. We will show the effectiveness of our method based on the IGR by comparison with other systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semantic-Sensitive Web Information Retrieval Model for HTML Documents

With the advent of the Internet, a new era of digital information exchange has begun. Currently, the Internet encompasses more than five billion online sites and this number is exponentially increasing every day. Fundamentally, Information Retrieval (IR) is the science and practice of storing documents and retrieving information from within these documents. Mathematically, IR systems are at the...

متن کامل

Document Ranking Method for High Precision Rate

Many information retrieval(IR) systems retrieve relevant documents based on exact matching of keywords between a query and documents. This method degrades precision rate. In order to solve the problem, we collected semantically related words and assigned semantic relationships used in general thesaurus and a special relationship called keyfact term(FT) manually. In addition to the semantic know...

متن کامل

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: this research investigates the impact of authors’ rank in Bibliographic networks on document-centered model of Expertise Retrieval. Its purpose is to find out what kind of authors’ ranking in bibliographic networks can improve the performance of document-centered model.   Methodology: Current research is an experimental one. To operationalize research goals, a new test colle...

متن کامل

A novel term weighting scheme based on discrimination power obtained from past retrieval results

Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes availability of past retrieval results consisting of the queries that contain a particula...

متن کامل

Word sense discrimination in information retrieval: A spectral clustering-based approach

Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generall...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001